IMPROVED APPARATUS AND METHOD FOR
MULTI-THREADED SIGNAL PROCESSING

Field of Invention 

Invention relates to electronic data and signal processing, particularly to
high-performance multi-threaded information processing techniques.

Background of Invention 
Traditional methods for achieving high performance in computational systems
for digital information processing have centered around the design of architectures
that deliver greater levels of parallelism. This is typically achieved via the design of
processors and instruction-set architectures that allow for the exploitation of hardware
parallelism and software concurrency.

High performance is typically defined as the ability to execute a very large
number of operations per second. This figure of merit is strongly dependent on the
type of operations, which typically depends on the type of application targeted.

Traditional design of high-performance information processing systems
usually relies on principles of computer architecture to define several key attributes of
the processing system:

• Instruction-set architecture refers to the actual programmer-visible sets of 
instructions, and serves as the boundary between hardware and software. 

• Organization refers to high-level aspects of computer design, such as memory
system, bus structure, and internal CPU design.

• Hardware refers to specific detailed logic design, circuit implementation, and 
packaging. 

In order to achieve high performance, which is an attribute typically required
in special-purpose processors (i.e., built for special applications), three approaches are
taken:

(1) Instruction-level parallelism: this approach, which exploits
parallelism in hardware, provides for parallel threads of processing via the
use of a very long or vectorized instruction word, whose fields can be
decomposed into concurrent processing threads. The mechanism to exploit
this parallelism may be realized via a scheduler, which schedules
operations onto one of several datapath processing units. This scheme has
many drawbacks, including the difficulty of building the scheduler and
identifying enough parallelism to achieve desired throughput.

(2) Superscalar techniques: this approach exploits fine-grain, highly
pipelined, single-threaded processor architectures to achieve high
performance. This scheme may achieve very high performance, but only
for a small class of operations. For operations not well-matched to a
particular datapath architecture, performance of superscalar design is
reduced significantly. Thus, the superscalar approach is unsuitable for
wide-ranging applications with high signal-processing content.

(3) Memory hierarchy techniques: to hide latency of memory accesses
to slower memories, memory hierarchy techniques have been used
extensively, especially in microprocessor designs, to increase overall
system performance by intelligently using fast memories, i.e., caches,
between the processor units and slower memory effectively to hide latency
of slower memory.

Conventionally, multi-processor systems may employ multi-threaded
processing to improve compute performance. Multi-threading generally is a known
approach for enhancing compute resource utility, and thus, overall processing
performance. However, ordinary multi-threaded processing solutions are
implemented using complex distributed or networked computer nodes, which are
often not easily reconfigurable at lower logic or circuit level, nor contemplated for
addressing advanced functional problem sets, such as multi-mode telecommunications
algorithms or networking protocols. Accordingly, there is a need for an improved
multi-thread processing solution.

Summary of Invention 
Invention resides in design and implementation methodology, processor
architecture, and system for processing multi-threaded digital information (signal or
data representation) to improve functional performance. Preferably, general system
design or functional definition, algorithm, electronic signal, or data file is provided
initially to include one or more multi-threaded representations. Such initial prototype
design or function may then be profiled or otherwise characterized for parallel or
effectively similar processing, in particular, in order functionally to use or otherwise
be implemented in one or more corresponding fixed, parameterizable, programmable,
or configurable logic units or other equivalent functional signal-processing kernel or
element, using temporal and/or non-temporal functional considerations.

Preferably, relatively complex system functionality, such as for application to
digital communications and/or networking and/or media processing system design, is
analyzed according to pre-specified system design rules, mathematical operations,
sequences of operations, or parameters, and then symbolically or schematically
represented to identify one or more algorithms, specific sequences of operations,
patterns of memory accesses, or segments (i.e., single or multi-"threads"), which may
each be profiled, structured, or otherwise characterized for optimized operation or
implementation using one or more particular fixed, parameterizable, programmable,
or configurable logic unit or kernel elements. Such element is built by providing a
datapath, whose structure and configurability are determined via profiling; a
sequencer/finite-state-machine, whose structure and configurability are determined via
profiling; and local memory, whose structure is determined via profiling memory
accesses and using locality to derive local memory properties. Optionally, one or
more kernel elements are implemented entirely in software or programmable logic, or
combination thereof. Further, as described herein, term "profiling" refers generally to
automated and/or manual processing of one or more system or function modules to
define one or more configurable structures associated with each module.



Brief Description of Drawings 
FIG. 1 is a general methodology and tool architecture diagram for
implementing in software and/or hardware a preferred embodiment of the present
invention.

FIGs. 2A-B are functional block diagrams for implementing one aspect of the 
present invention. 

FIG. 3 is a representative functional diagram illustrating heterogeneous aspect
of the present invention.

FIG. 4 is a representative functional diagram illustrating reconfigurable aspect
of the present invention.

FIG. 5 is a representative functional diagram illustrating kernel aspect of the
present invention.

FIG. 6 is a representative functional diagram illustrating interface aspect of the 
present invention. 

FIG. 7 is a system methodology flow chart showing functional operations for
implementing one or more aspects of the present invention.

FIG. 8 is representative of software code stubs for implementing one or more 
aspects of the present invention. 

FIGs. 9A-B are representative functional diagrams of one or more applications
of present invention.

Detailed Description of Preferred Embodiment 
Present innovation enables automated design and implementation to process
single or multi-threaded or equivalently partitioned processing of digital data, signals,
or functional representation for improved processing performance. Initially, system
design or functional definition, algorithm, electronic signal, or data file provides
certain single or multi-threaded representation, whereupon one or more system design
or function modules are profiled, structured, or otherwise characterized for parallel or
concurrent processing.

For example, multi-threaded prototype may be used or otherwise be
implemented in fixed, parameterizable, programmable, or configurable logic unit or
other signal-processing kernel or element. Hence, complex system functionality, such
as digital communication, networking, or multi-media application, may be analyzed
per system design rules, mathematical operations, sequences of operations, or
parameters, then symbolically or schematically represented to identify certain single or
multi-thread algorithms, specific sequences of operations, patterns of memory
accesses, or segments, each thread being profiled or characterized to optimize
operation or implementation using fixed, parameterizable, programmable, or
configurable logic unit or kernel element.

Optionally, single or multi-thread element is configured from a datapath,
whose structure is determined by profiling; a sequencer and/or equivalent
finite-state-machine, whose structure and configurability are determined by profiling;
and local memory, whose structure is determined by profiling memory accesses and
locality to derive memory properties.

As used herein, profiling terminology is understood to refer generally to any
computer-automated and/or manual processing, interpretation, or classification of one
or more system or function modules to define or categorize one or more configurable
structures associated with each module, e.g., by selecting or assigning one or more
functional elements or design objects, such as interconnection, signals, logic, circuits,
etc. Preferably, profiling is accomplished according to one or more previously and/or
dynamically defined criteria or functional rule set.

Generally, in a computer-automated and/or manual development approach, a
single or multi-threaded design is processed by providing initially a first-level
functional definition representing a prototype system, such that an other-level
functional definition symbolically representing equivalent functionality may be
generated or effectively profiled therefrom. In this hierarchical design scheme, the
generated symbolic representation may identify certain threads associated with the
system design, preferably at one or more functional levels.

Each thread may be profiled for processing by corresponding kernel
element(s), and one or more common sets of operations are identified for given threads
(e.g., on a 1-to-1, multiple-to-1, or 1-to-multiple thread-to-kernel relationship). Each
thread may further be mapped to identify the sequence, or scheduling information, for
each set of operators utilized to implement system or functional modules, such as a
sequence of arithmetic operations, control operations, and/or memory access
operations or related memory locations.
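
As an illustration only, the thread-to-kernel relationship described above can be sketched in a few lines of Python. The thread names, operator names, and the grouping rule (threads with the same operator set share one kernel) are assumptions made for this sketch; the specification does not prescribe them.

```python
# Hypothetical threads: each is an ordered sequence of operator names.
threads = {
    "despreader": ["load", "mul", "add", "store"],
    "matched_filter": ["load", "mul", "add", "add", "store"],
    "deinterleaver": ["load", "index", "store"],
}

def map_threads_to_kernels(threads):
    """Group threads by the set of operators they use.

    Threads sharing an operator set are assigned to one kernel
    (a multiple-to-1 thread-to-kernel relationship); a thread with a
    unique operator set gets its own kernel (1-to-1).
    """
    kernels = {}
    for name, ops in threads.items():
        kernels.setdefault(frozenset(ops), []).append(name)
    return kernels

mapping = map_threads_to_kernels(threads)
for operator_set, assigned in mapping.items():
    print(sorted(operator_set), "->", assigned)
```

The ordered operator list kept for each thread plays the role of the sequence, or scheduling information, while the operator set determines which kernel the thread is mapped to.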

Hence, using the present system development methodology, a multi-threaded
processing architecture may substantially include a set of kernel elements, such that
one kernel element processes certain function represented by corresponding thread,
and another kernel element in the same prototype design processes other function
represented by other corresponding thread. In this partitioned or distributed
processing approach, each thread may be profiled separately or hierarchically for
appropriate multi-level or functional group processing. For example, a first-level or
group kernel element and a second-level or group kernel element are respectively
associated with a corresponding first thread and second thread in a given function or
system design.

In a representative system design for wireless code division multiple access
(CDMA) communications application, it is contemplated that various kernels may be
provided to serve different functional groups, such as: front-end processing (e.g., data
switch selector, sample interpolation, etc.); chip-rate processing (e.g., sample epoch
selection, matched filter, generic despreader, generic dechannelizer, code generation
unit, integrate and dump, generic searcher control, etc.); symbol sequence processing
(e.g., transport format decoder, dynamic spreading factor computer, fast Hadamard
transform, etc.); channel element processing (e.g., alignment/deskewing, combiner,
soft decision computer, interpath interference equalizer, receive antenna diversity
combiner, etc.); interleaving (e.g., deinterleaver controller); and channel coding (e.g.,
turbo decoder, convolutional decoder, etc.).

Generally, present approach enables one or more functional or system designs
to be implemented efficiently, preferably via current multi-threading scheme, in a
single processor architecture by re-parameterizing, reprogramming, or reconfiguring
kernel elements (i.e., as determined by profiling technique as described further
herein) from which corresponding threads are assembled, and/or by changing
sequence of operations (i.e., as determined by mapping and/or scheduling) with which
threads are implemented. Preferred embodiment implements functional or system
design in one or more heterogeneous and reconfigurable logic or kernel elements (i.e.,
according to so-called "DRL" process, as described further herein).

FIG. 1 is a general architecture or system block diagram showing top-level
overview of present design methodology, functional modules, and software and/or
hardware tool architecture, preferably implemented in one or more electronic design
automation platforms, including one or more stand-alone or networked computers,
processors, engineering workstations, or other compute facility having appropriate
operating system, user interface, storage management, communications interfaces, and
other computer-aided design and engineering tools. Preferably, it is contemplated that
present design methodology serves to provide a tool architecture and processor
implementation and architecture, or data file representative thereof, for enabling
system architecture, such as network implementation.

As shown, initially one or more functional definition files 10, such as design
netlist, or high-level description language (such as C or HDL) defining one or more
functional modules or algorithms 12, are provided manually or computed automatically.
In accordance with one aspect of present implementation, functionally-selective
profiling and mapping scheme 14 is processed or applied to primitives 16 and
functional definitions 10 to generate or provide, particularly on a multi-threaded basis,
one or more control and communication signals 26 and kernels 18. Further, profiling
and mapping 14 provides scheduling data for schedule operation tables 20. Control
and communication signals are processed according to one or more predefined or
selected functional rule set or signaling flags, e.g., communication semaphores 24.
Various kernels 18 are processed and interconnected for implementation 22, for
example, in reconfigurable form as described herein for multi-threaded signal
processing.
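
The role of communication semaphores between interconnected kernels can be illustrated with a minimal software sketch. It assumes two kernels modeled as Python threads, with one semaphore signaling that data produced by the first kernel is ready for the second; the kernel functions and sample values are hypothetical and are not taken from the figures.

```python
import threading

samples = [1, 2, 3, 4]
shared = {"scaled": None, "total": None}

# Communication semaphore, initially unavailable: the consumer kernel
# blocks on it until the producer kernel signals that data is ready.
data_ready = threading.Semaphore(0)

def kernel_a():
    # Producer kernel: scales the samples, then signals the consumer.
    shared["scaled"] = [2 * s for s in samples]
    data_ready.release()

def kernel_b():
    # Consumer kernel: waits for the signal, then accumulates.
    data_ready.acquire()
    shared["total"] = sum(shared["scaled"])

# Start the consumer first to show that ordering is enforced by the
# semaphore and not by thread start order.
workers = [threading.Thread(target=kernel_b), threading.Thread(target=kernel_a)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(shared["total"])
```

In a hardware realization the same handshake would presumably be carried by control and communication signals; the sketch only shows the ordering constraint that such a flag imposes.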

FIGs. 2A-B functional block diagrams show representative set of kernels 18,
28 and their physical implementation, including schedule and allocate function 30.
Preferably, one or more kernels 18 are associated with or correspond to profiled and
mapped thread, and are implemented reconfigurably using sequencer 32, datapath 34,
and memory 36.

WO 01/55917 PCT/US01/02982

Hence, according to present system and circuit design methodology and/or
computing apparatus, general functional definition is implementable using single or
multi-threaded representation thereof, which may be profiled effectively for parallel
processing using one or more corresponding kernel logic elements (e.g., according to
1-to-multi, 1-to-1, multi-to-1, or multi-to-multi kernel-to-thread relationship). For
example, communication, networking, or media processing functionality or algorithm
is functionally analyzed and symbolically represented to identify one or more thread
segments, which are each profiled or otherwise characterized for optimized operation
or implementation using one or more particularly designated fixed, parameterizable,
programmable, or reconfigurable logic kernel.

FIG. 3 functional diagram shows representative heterogeneous, reconfigurable,
multi-processing arrangement, for example, whereupon kernel 8 may implement
"small" granularity threaded function, and kernel 6 may implement "large" granularity
threaded function. In this reconfigurable arrangement, various levels of functional
granularity, which is preferably an attribute of design function and corresponding
kernel, may be implemented or dynamically reconfigured according to design
requirement or profile mapping preference.

For further illustration, FIG. 4 functional diagram shows one or more
representative or available configurable logic or functions which may be employed
according to present approach for implementing single or multi-threads into
designated kernels, such as reconfigurable logic or programmable function units
(PFU) 40 having programmable logic elements and switch matrix (e.g., for encoding
bit-level operations), reconfigurable datapaths 42 having multiplexers, registers,
adders, buffers, etc. and configurable signal flow through these elements (e.g., for
dedicated datapath filters), reconfigurable arithmetic 44 having address generators,
memory, memory address control, etc. (e.g., for arithmetic convolution kernels), and
reconfigurable control 46 having data memory, datapath, program memory,
instruction decoder and controller, etc. (e.g., for real-time operating system process
management).

Moreover, as further illustration of sample kernel implementation, FIG. 5
functional diagram shows preferred functional elements for implementing kernel 18,
including data sequencer 32, data memory 36, and parameterizable configurable
arithmetic logic unit (ALU) 34.
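
A toy software model may help make the three kernel elements concrete. The sketch below is an assumption-laden illustration, not the implementation of FIG. 5: the class name, the schedule format (operation, two source addresses, one destination address), and the choice of operations are all hypothetical.

```python
import operator

class Kernel:
    """Toy kernel: a sequencer steps through a schedule, a parameterizable
    ALU applies operations, and a small local memory holds the data."""

    def __init__(self, alu_ops, memory_size):
        self.alu_ops = alu_ops            # parameterizable set of ALU operations
        self.memory = [0] * memory_size   # local data memory

    def run(self, schedule):
        # Sequencer: each step is (operation, source a, source b, destination).
        for op, a, b, dst in schedule:
            self.memory[dst] = self.alu_ops[op](self.memory[a], self.memory[b])
        return self.memory

kernel = Kernel({"add": operator.add, "mul": operator.mul}, memory_size=4)
kernel.memory[0], kernel.memory[1] = 3, 4

# Step 1: memory[2] = memory[0] * memory[1]; step 2: memory[3] = memory[2] + memory[0].
result = kernel.run([("mul", 0, 1, 2), ("add", 2, 0, 3)])
print(result)
```

Re-parameterizing the kernel corresponds to passing a different operation table or memory size, while reprogramming corresponds to passing a different schedule.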

FIG. 6 is a representative functional diagram illustrating optional interface
between dynamically reconfigurable logic (DRL) process 64 and associated
configuration database for processing functions externally to main processor hardware
model 50. Preferably, DRL process is heterogeneous and reconfigurable, and
implemented using current innovation. As shown, hardware interface 54 couples
processor element 52 associated with library 62 and specified functional modules 60,
including processor software model 57 having C-program model 56 and input/output
device drivers 58 to external DRL process 64.

In this optional embodiment, one or more single or multi-threaded digital
information (e.g., signal or data representation), such as general system design or
functional definition, algorithm, electronic signal or data file is provided initially to
include one or more multi-threaded representations, and such initial prototype design
or function is profiled or otherwise characterized for parallel or effectively similar
processing, in particular, in order functionally to use or otherwise be implemented in
one or more corresponding fixed, parameterizable, programmable, or configurable
logic unit or other equivalent functional signal-processing kernel or element in
processor model 50, 57 for functional cooperation or emulated real-time signal
interaction with external DRL process 64.

FIG. 7 flow chart shows another aspect of present operational steps. Initially,
user-generated or computer-generated functions are defined 70 for prototype or other
system design. Then, one or more mathematical analysis or design performance
optimization schemes may be applied 72 to initial design definition. Next, one or more
constituent algorithms for design definition are provided 74, and representation of such
algorithms is thereby coded 76, preferably in high-level, register transfer, or
behavioral functional format.

Algorithms may be profiled and mapped 78, or otherwise functionally defined
or categorized manually and/or automatically for optimized or directed operation or
implementation of system design modules, functions, signals, components, or other
element thereof using correspondingly defined kernels 80, preferably using one or
more specified design building-blocks, i.e., primitives 86. Profiling and mapping data
also are provided for communications semaphores 84 and scheduling and finite state
machine control and parameters 88. Then, kernel definition 80 and FSM control
parameterization and scheduling 88, as well as communications semaphores 84 are
applied to implement single or multi-threaded elements of present design into
processor architecture with reconfigurable kernel elements 82. FIG. 8 shows
representative software code of sample design indicating usage of multi-thread kernels
90.

In accordance with one aspect of present invention, profiling processing or
reconfigurable algorithms representative thereof is temporal, thereby including
determination of certain time value or degree of change over time. Example of
temporal application includes changes in receiver algorithms required in a cellular
wireless system and any associated signal processing scheme for these algorithms
which can take advantage of present profiling methodology. In this example,
whereupon processing throughput requirements in one path (e.g., reception direction)
may increase or decrease as processing progresses (e.g., from antenna to final
retrieved data representation), present profiling scheme serves to determine
hardware-software or other functional partitioning of overall design implementation.

Further, in such cellular wireless example, it is contemplated that multiple
methods may perform similar or equivalent signal processing, but result in different
air-interface requirements or effective functionality. Particularly in the hardware
partition of a given system, various processing forms or functional elements may
occur or operate at various rates. Because variable processing rates may be required,
and various modes of operational control may be dictated by support for multiple
processing streams, several additional non-temporal and temporal profiling techniques
may be applied to provide optimal functional flexibility in view of available
operational performance point or capacity of such hardware architecture (e.g.,
real-time and non-real-time profiling). It is contemplated generally herein that other
examples of application of present innovation may arise additionally with cellular
wireless, including fixed-wireless, unlicensed wireless LANs, cordless telephony,
telemetry, and the like.

One profiling technique applies to hardware-based algorithms across multiple
modes of operation to determine type and number of operations and storage elements
required, thereby enabling designer to classify each temporally-distinct function in a
form which facilitates identification of commonly-used resources.
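
A minimal sketch of this profiling technique follows, assuming each mode of operation is available as a flat trace of operation names. The mode names, traces, and the rule of sizing each shared resource by its peak per-mode count are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical operation traces for two modes of operation.
mode_traces = {
    "mode_a": ["mul", "add", "mul", "add", "shift"],
    "mode_b": ["mul", "add", "compare"],
}

def profile_modes(traces):
    """Determine the type and number of operations used in each mode."""
    return {mode: Counter(ops) for mode, ops in traces.items()}

def shared_resources(profiles):
    """Return operations used by every mode, sized by the peak count
    required in any single mode."""
    counters = list(profiles.values())
    common = set(counters[0])
    for counter in counters[1:]:
        common &= set(counter)
    return {op: max(counter[op] for counter in counters) for op in common}

profiles = profile_modes(mode_traces)
shared = shared_resources(profiles)
print(shared)
```

Operations that appear in only one mode (here `shift` and `compare`) fall outside the shared set and would be candidates for mode-specific resources.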

Another profiling technique applies for controlling multiple levels of hardware
definition according to required frequency of change. Here, mode-dependent
changes in receive path of wireless receiver, for example, may need to
occur at startup for global reconfiguration, between transaction configuration (e.g.,
where transactions are multi-second transactions), and within sub-second transaction
across blocks of data (e.g., "on the fly").

Depending on profiling results, appropriate level of configurable
implementation may be selected, such as for processing data at highest data rate
needing control on per-cycle basis. However, flexibility may be required for control,
and programmable state machine may provide optimal flexibility meeting necessary
performance requirements. For a datapath which may need to be selected at
configuration time, but is not changed often, then programmable interconnect may be
appropriately applied.

Moreover, if datapath selection occurs real-time, then datapath-cell-based
multiplexing structure may apply. Also, for control functions where operation
ordering is necessary, then parameterized kernels for processing operations may apply.
Additionally, in cases of high-performance requirements and low flexibility
requirements, dedicated datapaths are applicable to optimize silicon implementation.
In case of multi-standard wireless receiver design, which delivers optimal flexibility
relative to performance point, one or more of foregoing profiling techniques are
applicable.
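
The selection guidance in the two preceding paragraphs can be summarized as a small decision function. This is a sketch only: the profile field names are hypothetical, and the order in which the rules are tested is an assumption, since the text does not rank them.

```python
def select_implementation(profile):
    """Map a profiling result (a dict of hypothetical fields) to one of
    the implementation styles named in the text."""
    if profile.get("high_performance") and profile.get("low_flexibility"):
        return "dedicated datapath"
    if profile.get("per_cycle_control"):
        return "programmable state machine"
    if profile.get("datapath_selection") == "real_time":
        return "datapath-cell-based multiplexing"
    if profile.get("datapath_selection") == "configuration_time":
        return "programmable interconnect"
    if profile.get("ordered_control"):
        return "parameterized kernel"
    return "unclassified"

choices = [
    select_implementation({"high_performance": True, "low_flexibility": True}),
    select_implementation({"per_cycle_control": True}),
    select_implementation({"datapath_selection": "real_time"}),
    select_implementation({"datapath_selection": "configuration_time"}),
    select_implementation({"ordered_control": True}),
]
print(choices)
```

A multi-standard receiver would typically produce several such profiles, one per functional block, so more than one implementation style may appear in the same design.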

FIG. 9A shows general aspects of applying present invention, including flow
for transferring configuration table 92 of capability, parameters and values according
to one or more industry or proprietary standards through applications programming
interface (API) 94 to provide one or more configuration parameters for single or
multi-threaded reconfigurable system implementation according to present scheme,
e.g., using wired and/or over-the-air wireless network download or other
transmission/reception.

Preferred implementation receives configuration parameters through API 94 to
define or implement one or more interconnected block modules 96, representing
microprocessor, digital signal processor (DSP), application specific integrated circuit
(ASIC), field programmable gate array (FPGA), DRL, or other functional block
module, which further may be defined or implemented in one or more interconnected
kernel elements 98. In accordance with one aspect of present invention, one or more
configurable parameters 100 may be defined or implemented to correspond in
threaded fashion to one or more specified kernel elements. Hence, in this
configurable-parameter case, design and implementation method or system serves to
process multi-threaded digital signal or data for improved functional performance.
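
The flow from configuration table through API to kernel elements might look as follows in software. Everything named here is hypothetical: the table layout, the parameter names `spreading_factor` and `fingers`, the `KernelElement` class, and the `apply_configuration` function standing in for the API step.

```python
# Hypothetical configuration table: kernel element name -> parameter values.
configuration_table = {
    "despreader": {"spreading_factor": 64},
    "combiner": {"fingers": 4},
}

class KernelElement:
    def __init__(self, name):
        self.name = name
        self.parameters = {}

    def configure(self, parameters):
        # Re-parameterize the kernel without changing its structure.
        self.parameters.update(parameters)

def apply_configuration(table, kernels):
    """Stand-in for the API step: push each table entry to the matching
    kernel element and report entries that match no known kernel."""
    unknown = []
    for name, parameters in table.items():
        if name in kernels:
            kernels[name].configure(parameters)
        else:
            unknown.append(name)
    return unknown

kernels = {name: KernelElement(name) for name in ("despreader", "combiner")}
unknown = apply_configuration(configuration_table, kernels)
print({name: k.parameters for name, k in kernels.items()}, unknown)
```

In the wireless case described in the text, the table would presumably arrive by wired or over-the-air download before this step runs; the sketch models only what happens after it has been received.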

Generally, system design or functional definition, algorithm, electronic signal
or data file is provided to include such multi-threaded representation, and initial
prototype function is thus profiled for parallel processing by one or more threads, for
example, to implement certain parameterizable kernel elements, which may be
constrained temporally.

More particularly, in digital wireless communication application, as shown in
FIG. 9B, portable mobile radio handsets 102 transmit and receive signals wirelessly
with base station 104, possibly coupled to other handsets 102 and base stations 104
through digital network 106. In this networked application, specified design rules,
operations, or parameters, as well as any symbolic or schematic representation thereof
identify or correspond to multi-threads, for profiling and implementation in
programmable kernels or software modules.

Optionally, kernel elements may be configured for operation in base station
104 and/or handset units 102. In particular, kernels may be configured for profiled
datapath, sequencer/finite-state-machine, memory, or other logical structure, possibly
according to temporal or non-temporal design constraint.

Foregoing described embodiments of the invention are provided as
illustrations and descriptions. They are not intended to limit the invention to precise
form described.

In particular, Applicant contemplates that functional implementation of
invention described herein may be implemented equivalently in hardware, software,
firmware, and/or other available functional components or building blocks. Other
variations and embodiments are possible in light of above teachings, and it is thus
intended that the scope of invention not be limited by this Detailed Description, but
rather by Claims following.

Claims

What is claimed is: 

1. In a computer-assisted design system, an automated method for processing
multi-threaded system functionality, the method comprising the steps of:

providing a first function definition representing a system design;

generating from the first function definition a second function definition
representing symbolically the first function definition, such symbolic representation
identifying one or more thread associated with the system design; and

profiling each thread for processing by a specified kernel element or set
thereof.



2. The method of Claim 1 further comprising the steps of: 
identifying a common sequence of operations in a given thread; and 
associating the common sequence of operations with a set of operators. 

3. The method of Claim 2 further comprising the step of: 
associating the set of operators with a sequence of arithmetic operations. 



4. The method of Claim 2 further comprising the step of: 
associating the set of operators with a sequence of control operations.



5. The method of Claim 2 further comprising the step of: 
associating the set of operators with a sequence of memory access operations 
or locations. 

6. The method of Claim 1 wherein:

one or more threads is profiled according to a temporal function. 



7. Apparatus for multi-threaded processing comprising:

a first kernel element; and a second kernel element; 

wherein the first kernel element processes a first function represented by a first
thread, the second kernel element processes a second function represented by a second
thread, the first thread and the second thread each being profiled for processing
respectively by the first kernel element and the second kernel element, and the first
thread and the second thread being associated with a common function.



8. The apparatus of Claim 7 wherein: 

a common sequence of operations is identifiable with a given thread,
the common sequence of operations being associated with a set of operators.

9. The apparatus of Claim 8 wherein: 

the set of operators is associated with a sequence of arithmetic, control, or 
memory access operations. 

10. The apparatus of Claim 7 wherein:

the first or second thread is profiled according to a temporal constraint. 



11. The apparatus of Claim 7 wherein:

the first and second kernel elements are implemented as one or more executable
software modules.



12. The apparatus of Claim 7 wherein: 

the first and second kernel elements are implemented as one or more 
functional modules in a fixed base station or a mobile handset of a radio 
communication system. 

13. In a communication system comprising a base station and one or more 
portable units, wherein each portable unit may communicate wirelessly through radio 
signals with the base station, a method for signal processing comprising the step of: 

generating by a base station a first signal representing a system configuration, 
the first signal representing symbolically one or more function definition associated 
with one or more thread in the system configuration, wherein each thread is profiled 
for processing by a specified kernel element in a portable unit. 

14. The method of Claim 13 further comprising the step of: 

receiving the first signal by the portable unit, one or more kernel element in 
the portable unit being configured to process one or more thread in the system 
design according to the first signal. 

15. The method of Claim 13 wherein: 

one or more thread is profiled according to a temporal functional constraint. 